Models of relation classification
(Generalized) siamese architecture
Bordes et al. (2011) [Bordes, A., Weston, J., Collobert, R., & Bengio, Y. (2011, April). Learning Structured Embeddings of Knowledge Bases. In AAAI.] used a generalized siamese architecture to learn relations in knowledge bases. This approach is also called the distance model. Its main problem is that the parameters of the two entity vectors do not interact with each other: the two vectors are mapped independently to a common space. [Socher, R., Chen, D., Manning, C. D., & Ng, A. Y. (2013). Reasoning With Neural Tensor Networks for Knowledge Base Completion. Advances in Neural Information Processing Systems, 926–934.]

Single layer model
The single layer model tries to alleviate the problems of the distance model by connecting the entity vectors implicitly through the nonlinearity of a standard, single layer neural network. The scoring function has the following form:
g(e_1, R, e_2) = u_R^\intercal f(W_{R,1} e_1 + W_{R,2} e_2) = u_R^\intercal f\left(\begin{bmatrix} W_{R,1} & W_{R,2} \end{bmatrix} \begin{bmatrix} e_1 \\ e_2 \end{bmatrix}\right),
where f = \tanh, and W_{R,1}, W_{R,2} \in \mathbb{R}^{k \times d} and u_R \in \mathbb{R}^{k \times 1} are the parameters of relation R's scoring function. While this is an improvement over the distance model, the nonlinearity provides only a weak interaction between the two entity vectors, at the expense of a harder optimization problem. Collobert and Weston trained a similar model to learn word vector representations from words in their context. This model is a special case of the neural tensor network in which the tensor is set to 0.

Hadamard model
This model was introduced by Bordes et al. and tackles the issue of weak entity vector interaction through multiple matrix products followed by Hadamard products. It differs from the other models in this comparison in that it represents each relation simply as a single vector, e_R, that interacts with the entity vectors through several linear products, all of which are parameterized by the same parameters.
The scoring function is as follows:
g(e_1, R, e_2) = (W_1 e_1 \otimes W_{rel,1} e_R + b_1)^\intercal (W_2 e_2 \otimes W_{rel,2} e_R + b_2),
where \otimes denotes the Hadamard (element-wise) product, and W_1, W_{rel,1}, W_2, W_{rel,2} \in \mathbb{R}^{d \times d} and b_1, b_2 \in \mathbb{R}^{d \times 1} are parameters shared by all relations. The only relation-specific parameter is e_R. While this allows the model to treat relational words and entity words the same way, Socher et al. (2013) report in their experiments that giving each relation its own matrix operators improves performance. However, the bilinear form between entity vectors is by itself desirable.

Bilinear model
Jenatton et al. (2012) [Jenatton, R., Le Roux, N., Bordes, A., & Obozinski, G. (2012). A latent factor model for highly multi-relational data. In NIPS.] and Sutskever et al. (2009) [Sutskever, I., Salakhutdinov, R., & Tenenbaum, J. B. (2009). Modelling relational data using Bayesian clustered tensor factorization. In NIPS.] fix the issue of weak entity vector interaction through a relation-specific bilinear form. The scoring function is as follows:
g(e_1, R, e_2) = e_1^\intercal W_R e_2,
where W_R \in \mathbb{R}^{d \times d} are the only parameters of relation R's scoring function. This is a big improvement over the previous models, as it incorporates the interaction of the two entity vectors in a simple and efficient way. However, the model is now restricted by the word vectors in terms of expressive power and number of parameters: the bilinear form can only model linear interactions and is not able to fit more complex scoring functions. This model is a special case of the neural tensor network with V_R = 0, b_R = 0, k = 1 and f = identity. Compared with bilinear models, the neural tensor network has much more expressive power, which will be especially useful for larger databases. For smaller datasets, the number of tensor slices could be reduced or even vary between relations.

Neural tensor network
The neural tensor network was introduced by Socher et al. (2013).
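To make the relationships between these scoring functions concrete, the following NumPy code is a minimal sketch of each of them. It assumes the neural tensor network form given by Socher et al. (2013),
g(e_1, R, e_2) = u_R^\intercal f\left(e_1^\intercal W_R^{[1:k]} e_2 + V_R \begin{bmatrix} e_1 \\ e_2 \end{bmatrix} + b_R\right),
with W_R^{[1:k]} \in \mathbb{R}^{d \times d \times k}, V_R \in \mathbb{R}^{k \times 2d} and b_R, u_R \in \mathbb{R}^{k}. The L1 form of the distance model, all function names, and the dimensions d = 4, k = 3 are illustrative assumptions, not taken from the text above. The two comparisons test the special-case claims: with V_R = 0, b_R = 0, k = 1, f = identity (and u_R = 1) the network reduces to the bilinear model, and with a zero tensor, zero bias and V_R = [W_{R,1} W_{R,2}] it reduces to the single layer model.

```python
import numpy as np


def distance_score(e1, e2, W_R1, W_R2):
    """Distance model (assumed L1 form): each entity is projected independently.

    Lower values mean the two projected entities are closer in the common space.
    """
    return np.abs(W_R1 @ e1 - W_R2 @ e2).sum()


def single_layer_score(e1, e2, W_R1, W_R2, u_R, f=np.tanh):
    """Single layer model: u_R^T f(W_{R,1} e1 + W_{R,2} e2)."""
    return u_R @ f(W_R1 @ e1 + W_R2 @ e2)


def hadamard_score(e1, e2, e_R, W1, W_rel1, b1, W2, W_rel2, b2):
    """Hadamard model: (W1 e1 * W_rel1 e_R + b1)^T (W2 e2 * W_rel2 e_R + b2).

    Only e_R is relation-specific; all matrices and biases are shared.
    """
    left = (W1 @ e1) * (W_rel1 @ e_R) + b1   # '*' is the element-wise product
    right = (W2 @ e2) * (W_rel2 @ e_R) + b2
    return left @ right


def bilinear_score(e1, e2, W_R):
    """Bilinear model: e1^T W_R e2."""
    return e1 @ W_R @ e2


def ntn_score(e1, e2, W_R, V_R, b_R, u_R, f=np.tanh):
    """Neural tensor network: u_R^T f(e1^T W_R^{[1:k]} e2 + V_R [e1; e2] + b_R).

    W_R has shape (k, d, d); slice i contributes the scalar e1^T W_R[i] e2.
    """
    bilinear = np.einsum("i,kij,j->k", e1, W_R, e2)
    linear = V_R @ np.concatenate([e1, e2])
    return u_R @ f(bilinear + linear + b_R)


rng = np.random.default_rng(0)
d, k = 4, 3
e1, e2 = rng.normal(size=d), rng.normal(size=d)

# Bilinear model = NTN with k = 1, V_R = 0, b_R = 0, f = identity and u_R = 1.
W = rng.normal(size=(d, d))
ntn_bilinear = ntn_score(e1, e2, W[np.newaxis], np.zeros((1, 2 * d)),
                         np.zeros(1), np.ones(1), f=lambda x: x)
print(np.isclose(ntn_bilinear, bilinear_score(e1, e2, W)))

# Single layer model = NTN with a zero tensor, zero bias and V_R = [W_R1 W_R2].
W_R1, W_R2 = rng.normal(size=(k, d)), rng.normal(size=(k, d))
u_R = rng.normal(size=k)
ntn_single = ntn_score(e1, e2, np.zeros((k, d, d)), np.hstack([W_R1, W_R2]),
                       np.zeros(k), u_R)
print(np.isclose(ntn_single, single_layer_score(e1, e2, W_R1, W_R2, u_R)))

# Hadamard model: one vector e_R per relation, every matrix shared.
shared = [rng.normal(size=(d, d)) for _ in range(4)]
b1, b2 = rng.normal(size=d), rng.normal(size=d)
e_R = rng.normal(size=d)
print(hadamard_score(e1, e2, e_R, shared[0], shared[1], b1,
                     shared[2], shared[3], b2))
print(distance_score(e1, e2, W_R1, W_R2))
```

Because slice i of W_R contributes its own term e_1^\intercal W_R^{[i]} e_2, each slice behaves like a separate bilinear model; this is why reducing the number of slices k trades expressive power for fewer parameters.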